ABSTRACT

Distributed denial of service (DDoS) attacks have long been a concern of cybersecurity experts.
Several methods can be used to detect DDoS attacks; the one used in this research is the n-gram
technique. The n-gram approach analyzes the payload of data packets that enter the network to
obtain attack patterns. Data is captured and analyzed, after which it is compared with clean data
packets. A chi-square distance value close to 1 indicates that the two packets are very similar, so
the data packet is not an attack. A value less than one means the data packet is categorized as an
attack. In this research, the threshold for determining the attack level can be lowered to obtain a
very high detection accuracy. As a result, the 2-gram technique has the lowest false positive
value, around 13%, with the highest true positive ratio reaching 99.98%.

Keywords: Chi-square distance, DDoS, Malware, N-grams

1. INTRODUCTION

Distributed denial of service (DDoS) attacks exploit the internet to target critical web services.
Such an attack aims to prevent legitimate users from accessing specific network resources, or to
degrade the normal service offered to them, by sending massive unwanted traffic to the victim
(machines or networks) to exhaust connection capacity or bandwidth. The growing volume of DoS
attacks has put servers and network devices on the internet at greater risk. DDoS attacks have
existed for several years. Earlier single-source attacks are now countered by several defense
mechanisms, and the source of those attacks can be rejected or blocked thanks to improved tracing
capabilities. However, with the rapid growth of the internet, many systems are now vulnerable to
attackers, who can use a vast range of hosts to launch attacks on a server. Until now, there has
been no accurate and fast technique to overcome them [1].

Personal and company data need to be protected. To secure data, companies usually use cloud systems
or build their own servers to store data. But a computer that is always connected to the internet
also has weaknesses. The weakness usually lies in the server's ability to serve all kinds of data
access: there are so many requests for data access that they are difficult to control. An attacker
can thus, intentionally or unintentionally, bring down the server's defense system, and recovery
under these conditions is costly. Companies usually use intrusion detection systems (IDS) as a line
of defense to protect the system from attack. A server protected by an IDS has a higher level of
security because the IDS supports the firewall: attacks that the firewall fails to detect are
filtered again by the IDS [2]. The usual type of attack on the server is the DDoS attack. Based on data obtained,
the cybersecurity research company Kaspersky noted that DDoS attacks in the first quarter (Q1) of
2020 increased sharply compared to the same period in the previous year. The surge occurred on
sites related to education and cities. DDoS is a cyber-attack carried out by flooding servers,
systems, or networks with internet traffic. Typically, this attack uses multiple host computers
until the target computer cannot be accessed. DDoS is a popular attack among hackers. Kaspersky
explained that the surge in DDoS attacks in the first three months of 2020 could be because hackers
took advantage of a situation in which people had to carry out their activities at home and were
very dependent on digital resources. The coronavirus pandemic [3], which began in the first quarter
of 2020, caused almost all activities, be it studying, working, or relaxing, to shift online. The
increasing demand for online resources is well known to cyber attackers, who attack the most vital
or increasingly popular digital services. The US Department of Health and Human Services, several
hospitals in Paris, and online game servers are examples of targets of DDoS attacks in February and
March.

In its DDoS attack report for Q1 2020, Kaspersky also revealed significant growth in attacks on
educational resources and official city websites. In Q1 2020, this number tripled compared to the
same period in 2019, and such attacks amounted to 19 percent of the total number of incidents in
Q1 2020. Kaspersky experts estimate that attackers' interest increased because people became more
dependent on stable and accessible online resources during the pandemic. People who are
increasingly worried about the pandemic and want to take preventive action are likely to turn to
official sources of information for reliable guidance. Many schools and universities have also
switched to online learning. In general, the total number of DDoS attacks in Q1 2020 has indeed increased.
During this period, Kaspersky DDoS Protection detected and blocked twice the number of attacks
compared to Q4 2019, and 80 percent more than in Q1 2019. The average duration of attacks also grew
in Q1 2020: DDoS attacks lasted 25 percent longer than in Q1 2019.

To anticipate such attacks, a method is needed to detect DDoS attacks. The survey in [4] found that
the Naive Bayes (NB) algorithm provides faster learning/training speed than other machine learning
algorithms and is more accurate in classifying and detecting attacks; its authors therefore
proposed a network intrusion detection system (IDS) that uses a machine learning approach based on
the NB algorithm. In contrast, the research conducted by [4] applied a deep learning methodology
based on a Gaussian-Bernoulli restricted Boltzmann machine (RBM) to the detection of denial of
service (DoS) attacks. To increase the DoS attack detection accuracy, seven additional layers are
added between the visible and the hidden layers of the RBM. Accurate DoS attack detection results
are obtained by optimizing the hyperparameters of the proposed deep RBM model, and a form of the
RBM that permits the use of continuous data is employed. To detect recent DDoS attacks, the
researchers in [5] argued that special treatment is needed at the application layer, for example
the hypertext transfer protocol (HTTP), which must be protected. In this protocol, some commands
must be considered, such as HTTP requests.


2. RESEARCH METHOD 

Haris et al. [6] proposed a comprehensive defensive system called the web to protect client hosts,
network routers, and network servers from becoming victims, zombies, and handlers of DDoS flood
attacks. It covers any IP-based public network on the internet and uses prevention and rate
limiting to eliminate system vulnerabilities on target machines. It enforces dynamic security
policies for safeguarding network resources against DDoS flood attacks. DDoS instrumentation has
also been proposed as a comprehensive framework for DDoS attack detection. It uses a network-based
detection technique to defend against advanced and slow types of DDoS attacks and works in parallel
to examine and manage ongoing traffic in real time. It performs stateful inspection of traffic flow
streams and correlates events among different sessions by continuously monitoring both DDoS attacks
and legitimate applications. It terminates the session when it detects a DDoS attack. Several
methods are used to detect DDoS attacks, namely statistical, knowledge-based, soft computing, data
mining, and machine learning.

This research uses four techniques: statistical-based, knowledge-based, soft computing, and machine
learning, which together are called heuristic detection in this study. These techniques were chosen
because a payload being analyzed needs to be diagnosed early: the number of incoming data packets
per second is measured (statistical), incoming packets are checked against standard packets
(knowledge-based), and the accuracy of detecting malicious packets is tested with datasets that are
already available on the internet (data mining and machine learning). Details of the techniques
used in this study can be seen in Figure 1.


2.1. Research framework 


To facilitate the research, a framework is needed that explains in detail the process being carried
out. The datasets in this study were taken from the Canadian Institute for Cybersecurity
(CICIDS2017) and a management information base (MIB2016), and packet data were captured using
Wireshark and then extracted into several features. The packet data are analyzed one by one using
the n-gram technique to separate standard packets from attack packets. The n-gram technique is
applied to the data payload of a data packet. The research flow in detecting DDoS attacks can be
seen in Figure 2.


[Figure 1 diagram: DDoS detection and mitigation methods, divided into statistical, knowledge-based,
soft computing, and machine learning.]

Figure 1. DDoS detection and mitigation methods [7]

Figure 2. Research framework [8], [9]


The general payload structure is as follows: the data payload is in hexadecimal and is separated
character by character in the form of strings. The resulting string produces gram sequences of
2-grams, 3-grams, 4-grams, 5-grams, and 6-grams. For each gram sequence, the number of occurrences
of each string (its frequency) is determined. This applies to both datasets, the attack dataset and
the normal (benign) dataset; their packets are called the surveyed packet and the reference packet,
respectively. The number of normal packets is 529,918 records. These packets serve as the
comparison for determining the status of the packet being surveyed. To find the comparative value
between the surveyed packets and the standard packets, the n-gram technique is used. The numbers of
strings appearing in the n-gram technique are used to calculate the distance for each gram, which
in this study is called the chi-square distance. The chi-square distance is a method of finding the
distance between two histograms (X and Y). From these calculations, the level of similarity between
the surveyed packets and the previously formed normal packets can be seen.
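
The gram sequences described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names `ngram_counts` and `ngram_table` are hypothetical, and the window is assumed to slide one hexadecimal character at a time, which is what the gram values in Tables 1 to 5 suggest.

```python
from collections import Counter


def ngram_counts(payload_hex, n):
    """Count every n-character substring of a hexadecimal payload string."""
    # Drop whitespace and normalise case so "0D0A" and "0d0a" count together.
    cleaned = "".join(payload_hex.split()).lower()
    # Slide a window of width n one character at a time (assumed stride).
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def ngram_table(payload_hex, sizes=(2, 3, 4, 5, 6)):
    """Build one frequency table per gram size, 2-gram through 6-gram."""
    return {n: ngram_counts(payload_hex, n) for n in sizes}


# First six bytes of the observed payload shown later in this section.
sample = "00c1b114eb31"
table = ngram_table(sample)
print(table[3].most_common(3))
```

The same function is applied to the surveyed payload and to the reference payload, giving the two histograms X and Y that the chi-square distance compares.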
2.2. Dataset

The selection of the dataset is essential for detecting DDoS attacks. A dataset is needed for
testing and validation so that this research makes the greatest possible contribution. The datasets
used in this study were taken from the simple network management protocol-management information
base (SNMP-MIB2016) and the Canadian Institute for Cybersecurity (CICIDS2017) [10]-[12]. The
dataset contains two categories of data: normal data and data containing attacks. The data
collection period starts on Monday, November 2019, and runs from 9 am to 5 pm for five consecutive
days. On the first day, the data collected consist only of normal data. The following days contain
data from brute force file transfer protocol (FTP), brute force secure shell (SSH), DoS,
Heartbleed, web attack, infiltration, botnet, and DDoS attacks. The applications used to carry out
DoS attacks are Slowloris DoS, DoS slowhttptest, DoS Hulk, and GoldenEye. Meanwhile, a DDoS attack
tool called hammer master is used to produce a new dataset called H2NPyalod with a total of 1954
records. The accuracy obtained on the new dataset was compared with that on the CICIDS2017 and
SNMP-MIB2016 datasets using three algorithms: k-nearest neighbor (kNN), neural network, and support
vector machine (SVM) [10], [13].


2.3. Feature extraction 

Feature extraction is a technique for taking features from an object so that the values obtained
can be analyzed in the next process [14]. The features extracted in this study are taken from data
packets captured using Wireshark by selecting network protocols such as hypertext transfer protocol
(HTTP) and transmission control protocol (TCP); the packets come from two datasets available on the
internet and one set of data packets that was captured directly, as shown in Figure 3. From
Figure 3 [15], the next step is to extract features to get the payload [6], that is, the data in
hexadecimal form. The hexadecimal data is separated into data frames as standardized in the
TCP/internet protocol (IP) suite; the payload is then broken down into strings, which in this study
are called n-grams [16], [17].














[Figure 3 screenshot: Wireshark window for HTTP_Packet DDoS.pcap with the display filter http. The
packet list shows HTTP GET requests from 147.32.84.165 and the HTTP/1.1 responses from remote
hosts; the detail pane shows frame 1 (447 bytes; Ethernet II, Internet Protocol Version 4,
Transmission Control Protocol, source port 1027, destination port 80) above its hexadecimal bytes.
Packets: 6525, displayed: 2657 (40.7%).]


Figure 3. Capture and extract feature 


2.4. Payload selection 

The packet payload used in this study is as follows: 192.168.10.15 to 13.107.4.50, HTTP GET
/c/msdownload/update/software/defu/2017/07/am_delta_5d55c62f8e8a1e3751fa8dcf66647bb33ae5b343.exe
HTTP/1.1.
- Payload observed
00c1b114eb31001e4fd4ca2808004500016c352a40008006e80dc0a80a0f0d6b0432c12b0050a4f896d828237a
ce5018010249620000474554202f632f6d73646f776e6c6f61642f7570646174652f736f6674776172652f64656
6752f323031372f30372f616d5f64656c74615f35643535633632663865386131653337353166613864636636
3636343762623333616535623334332e65786520485454502f312e310d0a436f6e6e656374696f6e3a204b656
5702d416c6976650d0a4163636570743a202a2f2a0d0a4163636570742d456e636f64696e673a206964656e74
6974790d0a49662d556e6d6f6469666965642d53696e63653a205765642c203035204a756c203230313720313
13a32333a343820474d540d0a52616e67653a2062797465733d31343738382d32343537350d0a557365722d4
167656e743a204d6963726f736f667420424954532f372e380d0a486f73743a2061752e646f776e6c6f61642e7
7696e646f77737570646174652e636f6d0d0a0d0a

The payload in the structure above is marked in red. The payload is analyzed using n-grams to
determine whether a packet is dangerous or not. It is then compared with a normal packet, shown in
the following payload.
- Payload expected
00c1b114eb31b8ac6f1d1f6c08004500008900e440008006c7dac0a80a09451f21e0042100507990c654a5710f
3d50180100d4730000474554202f6e6373692e74787420485454502f312e310d0a436f6e6e656374696f6e3a20
436c6f73650d0a557365722d4167656e743a204d6963726f736f6674204e4353490d0a486f73743a207777772e
6d7366746e6373692e636f6d0d0a0d0a

The sample payloads above were analyzed using 3-grams. For the pattern "a20", which occurs in both
payloads, seven occurrences were found in the analyzed payload, whereas three occurrences were
found in the comparison payload. To determine whether a payload is dangerous or not, the chi-square
distance parameter is used; the results of the analysis are described in full in Table 1 to
Table 5.
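
The "a20" counts can be checked with a short sketch. The two strings below are the HTTP request text that the hexadecimal payloads above decode to; the Ethernet, IP, and TCP header bytes are left out here to keep the example short, which is an assumption of this sketch, and the helper name `count_ngram` is hypothetical.

```python
def count_ngram(text, gram):
    """Count occurrences of one hexadecimal n-gram in the hex form of an ASCII string."""
    hex_string = text.encode("ascii").hex()  # lowercase hex, two characters per byte
    n = len(gram)
    return sum(1 for i in range(len(hex_string) - n + 1) if hex_string[i:i + n] == gram)


# HTTP portion of the observed (surveyed) payload.
observed_http = (
    "GET /c/msdownload/update/software/defu/2017/07/"
    "am_delta_5d55c62f8e8a1e3751fa8dcf66647bb33ae5b343.exe HTTP/1.1\r\n"
    "Connection: Keep-Alive\r\n"
    "Accept: */*\r\n"
    "Accept-Encoding: identity\r\n"
    "If-Unmodified-Since: Wed, 05 Jul 2017 11:23:48 GMT\r\n"
    "Range: bytes=14788-24575\r\n"
    "User-Agent: Microsoft BITS/7.8\r\n"
    "Host: au.download.windowsupdate.com\r\n\r\n"
)

# HTTP portion of the expected (reference) payload.
reference_http = (
    "GET /ncsi.txt HTTP/1.1\r\n"
    "Connection: Close\r\n"
    "User-Agent: Microsoft NCSI\r\n"
    "Host: www.msftncsi.com\r\n\r\n"
)

print(count_ngram(observed_http, "a20"), count_ngram(reference_http, "a20"))
```

Each occurrence comes from a header separator, a colon (3a) followed by a space (20), so the count tracks the number of header lines in each request.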


2.5. N-gram technique 

An n-gram is a sequence of n items from a series of texts or words. The items can be anything, for
example letters, words, or sentences, depending on the application. To determine whether a packet
is an attack or not, the following formula is used:


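D^2(X, Y) = \sum_{i=1}^{N} \frac{(x_i - y_i)^2}{y_i}    (1)

where x_i and y_i are the relative frequencies of the i-th n-gram in the surveyed payload X and the
reference payload Y, and N is the number of n-grams compared.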


The value of D is used to test whether the analyzed packet is an attack or not. If the chi-square
distance is smaller than the chi-square table value (χ²), or is greater than 0.05, the packet is
classified as an attack (DDoS).
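
A minimal sketch of the distance calculation follows. It assumes that each term is the squared difference of relative frequencies divided by the reference frequency, which reproduces the per-row values in Table 1; skipping n-grams that never occur in the reference payload is a further assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def relative_frequencies(payload_hex, n):
    """Relative frequency of each n-gram in a hexadecimal payload string."""
    cleaned = "".join(payload_hex.split()).lower()
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def chi_square_distance(surveyed, reference):
    """Sum of (x - y)^2 / y over the n-grams present in the reference histogram."""
    # n-grams absent from the reference (y = 0) are skipped: an assumption.
    return sum(
        (surveyed.get(gram, 0.0) - y) ** 2 / y
        for gram, y in reference.items()
    )


# First row of Table 1: surveyed 0.001322751, reference 0.00332226.
row = chi_square_distance({"e5": 0.001322751}, {"e5": 0.00332226})
print(round(row, 9))
```

Summing such terms over all n-grams of one size gives the total distance reported in the last row of each table.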


2.6. Algorithm classification 

To measure the accuracy of DDoS detection with the n-gram technique, machine learning
classification methods are used. The algorithms used are supervised learning algorithms, namely
kNN, neural network, and SVM [18]. The k-nearest neighbor algorithm is a supervised learning
algorithm in which a new instance is classified according to the majority category among its k
nearest neighbors. The purpose of this algorithm is to classify new objects based on attributes and
samples of training data. The k-nearest neighbor algorithm uses the neighborhood classification as
the predictive value of the new instance. The neural network algorithm adopts a mechanism that
resembles the human brain, both in processing the various signals it receives and in its tolerance
for errors. The last algorithm in this study is SVM, a reliable method for solving data
classification problems. With the SVM model, the data are divided into training data and test data.
The training data are used to form the SVM model, while the independent parameter values are
selected from the initial data.


Algorithm kNN

1 For each training pattern <x, f(x)>, add the pattern to the list of training patterns
2 For a query pattern xq:
  - Let x1, x2, ..., xk be the k patterns that have the closest distance (neighbors) to xq
  - Return the class that has the largest number of patterns among the k patterns as the
    decision class
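
The two kNN steps can be sketched as follows. This is an illustration only: Euclidean distance, k = 3, and the one-dimensional feature (a chi-square distance per packet) with made-up values are assumptions, not settings reported in this study.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Return the majority label among the k training patterns closest to query.

    training: list of (feature_tuple, label) pairs (step 1 of the listing).
    """
    # Step 2a: the k patterns with the smallest Euclidean distance to the query.
    neighbors = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    # Step 2b: the class with the most patterns among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training patterns: one chi-square distance per packet.
training = [
    ((0.10,), "normal"), ((0.20,), "normal"), ((0.30,), "normal"),
    ((0.90,), "ddos"), ((0.95,), "ddos"), ((1.00,), "ddos"),
]
print(knn_predict(training, (0.92,)))
```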


Algorithm neural network

1 For each pattern pi
    wi = pi
    Form the pattern unit with the input weight vector wi
    Connect the pattern unit to the summing unit for its class
  End
  Determine the constant |Ck| for each summing unit
2 For each pattern pi
    k = class of pi
    Find the distance, di, to the closest pattern in class k
    dtot[k] = dtot[k] + di
  End
  For each class k
    σk = (g · dtot[k]) / |Ck|
  End


Algorithm SVM

1 Initialization, αi = 0
  Calculate the matrix Dij = yi yj (K(xi, xj) + λ²)
2 Perform the three steps below for i = 1, 2, ..., l
  - Ei = Σj αj Dij, with j = 1, ..., l
  - δαi = min{max[γ(1 - Ei), -αi], C - αi}
  - αi = αi + δαi
3 Return to Step 2 until α converges
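
The SVM listing can be read as a sequential training loop, sketched below under stated assumptions: a linear kernel, illustrative values for the learning rate gamma, the bound C, and lambda, and a fixed number of passes in place of an explicit convergence test. The one-dimensional toy data and the function name are hypothetical.

```python
def train_sequential_svm(xs, ys, gamma=0.1, C=10.0, lam=1.0, epochs=200):
    """Sequential SVM training; xs are feature tuples, ys are labels in {-1, +1}."""

    def kernel(a, b):
        # Linear kernel (assumed); any positive-definite kernel could be used.
        return sum(p * q for p, q in zip(a, b))

    n = len(xs)
    # Step 1: alpha_i = 0 and D_ij = y_i * y_j * (K(x_i, x_j) + lambda^2).
    alpha = [0.0] * n
    D = [[ys[i] * ys[j] * (kernel(xs[i], xs[j]) + lam ** 2) for j in range(n)]
         for i in range(n)]
    # Steps 2 and 3: repeat the three updates for a fixed number of passes.
    for _ in range(epochs):
        for i in range(n):
            E = sum(alpha[j] * D[i][j] for j in range(n))
            delta = min(max(gamma * (1.0 - E), -alpha[i]), C - alpha[i])
            alpha[i] += delta

    def decide(x):
        # Sign of the weighted kernel sum; lambda^2 plays the role of the bias term.
        score = sum(alpha[i] * ys[i] * (kernel(xs[i], x) + lam ** 2) for i in range(n))
        return 1 if score >= 0 else -1

    return alpha, decide


# Toy, linearly separable data: negative values labelled -1, positive values +1.
xs = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
ys = [-1, -1, 1, 1]
alpha, decide = train_sequential_svm(xs, ys)
print(decide((3.0,)), decide((-3.0,)))
```

The clipping in the second update keeps every multiplier between 0 and C, which is why no separate projection step is needed.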


3. RESULTS AND ANALYSIS 
3.1. Chi-square distance 

After the payload data are extracted, the payloads are split using the n-gram algorithm, from
2-grams through 6-grams. For the 2-gram payload, the frequency of occurrence of each string in the
surveyed payload is counted and converted to a percentage, and the same is done for the standard
payload. After these steps, the chi-square distance value is obtained for each gram; if the
chi-square distance value is greater than the 0.9 threshold, the payload is abnormal (DDoS), as
shown in Table 1.

For the 3-gram payload, the frequency of occurrence of each string in the surveyed payload is
counted and converted to a percentage, and the same is done for the standard payload. After these
steps, the chi-square distance value is obtained for each gram; if the chi-square distance value is
greater than the 0.9 threshold, the payload is abnormal (DDoS), as shown in Table 2.

For the 4-gram payload, the frequency of occurrence of each string in the surveyed payload is
counted and converted to a percentage, and the same is done for the standard payload. After these
steps, the chi-square distance value is obtained for each gram; if the chi-square distance value is
greater than the 0.9 threshold, the payload is abnormal (DDoS), as shown in Table 3.


Table 1. 2-Gram analysis payload 


Packet Survey                                        Reference (Normal Packet)
2-Gram   F   %             Chi-Square Distance       2-Gram   F   %
e5       1   0.001322751   0.001203407               71       1   0.00332226
b1       1   0.001322751   0.001203407               b1       1   0.00332226
Chi-Square Distance        0.337496629


Table 2. 3-Gram analysis payload 


Packet Survey                                        Reference (Normal Packet)
3-Gram   F   %             Chi-Square Distance       3-Gram   F   %
00c      1   0.00132626    0.017435428               0c1      1   0.020000
735      2   0.00265252    0.002417006               0a4      2   0.006667
a20      7   0.00397878    0.00108371                a20      3   0.006667
Chi-Square Distance        0.796257719


Table 3. 4-Gram analysis payload 


Packet Survey                                        Reference (Normal Packet)
4-Gram   F   %             Chi-Square Distance       4-Gram   F   %
00c1     1   0.00132626    0.014175074               0c10     0   0.01672241
b114     1   0.00132626    0.001217892               b114     1   0.00334448
0d0a     9   0.01193634    0                                  9   0.01193634
Chi-Square Distance        0.887882948

For the 5-gram payload, the frequency of occurrence of each string in the surveyed payload is
counted and converted to a percentage, and the same is done for the standard payload. After these
steps, the chi-square distance value is obtained for each gram; if the chi-square distance value is
greater than the 0.9 threshold [19], the payload is abnormal (DDoS), as shown in Table 4.

For the 6-gram payload, the frequency of occurrence of each string in the surveyed payload is
counted and converted to a percentage, and the same is done for the standard payload. After these
steps, the chi-square distance value is obtained for each gram; if the chi-square distance value is
greater than the 0.9 threshold, the payload is abnormal (DDoS), as shown in Table 5.


Table 4. 5-Gram analysis payload 


Packet Survey                                        Reference (Normal Packet)
5-Gram   F   %             Chi-Square Distance       5-Gram   F   %
00c1b    1   0.00132626    0.001227357               00c1b    1   0.0033557
0c1b1    1   0.00132626    0.001227357               0c1b1    1   0.0033557
76e6c    2   0.00265252    0.002454713                        3   0.0039787
Chi-Square Distance        0.888606111


Table 5. 6-Gram analysis payload 


Packet Survey                                        Reference (Normal Packet)
6-Gram   F   %             Chi-Square Distance       6-Gram   F   %
00c1b1   1   0.00132626    0.0012369                 00c1b1   1   0.003367
0c1b11   1   0.00132626    0.0012369                 0c1b11   1   0.003367
46f776   2   0.00265252    0.00247379
Chi-Square Distance        0.92519837
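
The threshold rule can be applied to the total distances reported in Tables 1 to 5 with a short sketch; the function name `classify` is hypothetical, and the 0.9 threshold is the one stated in this section.

```python
THRESHOLD = 0.9  # threshold stated in Section 3.1

# Total chi-square distance per gram size, copied from Tables 1 to 5.
distances = {
    2: 0.337496629,
    3: 0.796257719,
    4: 0.887882948,
    5: 0.888606111,
    6: 0.92519837,
}


def classify(distance, threshold=THRESHOLD):
    """Label a payload abnormal (DDoS) when its distance exceeds the threshold."""
    return "DDoS" if distance > threshold else "normal"


labels = {n: classify(d) for n, d in distances.items()}
for n, label in labels.items():
    print(f"{n}-gram: {distances[n]:.6f} -> {label}")
```

For this sample payload, only the 6-gram distance exceeds 0.9.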


3.2. Performance comparison 

Using the n-gram technique, 269 packets were detected as DDoS packets and 1685 as standard packets.
Based on the detection results of the n-gram approach, the accuracy of DDoS attack detection was
tested using machine learning methods. The algorithms used for measuring performance are kNN,
neural network, and SVM, and the results of the analysis can be seen in Table 6.

According to the table, the detection accuracy of the n-gram technique with the support vector
machine (SVM) algorithm is 99.00% for 1-gram, 99.23% for 2-gram, 96.5% for 3-gram, 97.14% for
4-gram, 96.7% for 5-gram, and 94.07% for 6-gram. With the neural network algorithm, it is 99.00%
for 1-gram, 99.98% for 2-gram, 96.3% for 3-gram, 95.9% for 4-gram, 95.5% for 5-gram, and 93.3% for
6-gram. With the kNN algorithm, it is 98.00% for 1-gram, 99.98% for 2-gram, 96.7% for 3-gram,
98.00% for 4-gram, 97.8% for 5-gram, and 96.3% for 6-gram. When compared with the other algorithms,
the machine learning methods in this study achieved the highest accuracy, 99.98% at 2-gram.

The Mahalanobis distance algorithm [20]-[22] focuses only on 2-gram and 4-gram. In contrast, the
chi-square distance algorithm [23] uses 1-gram to 5-gram, and the reputation value algorithm uses
3-gram to 6-gram, while the research in [24] uses only 2-gram with different algorithms, namely
cosine similarity, with an accuracy of up to 65%, the Jaccard index, with 65%, and the Levenshtein
distance, with 80%. The results in Table 6 show that the most dominant n-gram techniques in
detecting DDoS attacks are 2-gram and 3-gram, with the highest accuracy reaching 99.98%. The
research in [25]-[27] obtained only 98.7%, a difference of 1.28 percentage points. This can be seen
visually in Figure 4.


Table 6. The performance comparison between our approach and methods in 


Accuracy of n-gram technique                             Algorithm
1-G      2-G      3-G      4-G      5-G      6-G
-        94.7%    -        75.7%    -        -        Mahalanobis distance
99.81%   98.7%    99.00%   99.00%   99.00%   -        Chi-Square Distance
-        -        94.04% (3-G to 6-G)                 Reputation values
-        65%      -        -        -        -        Cosine Similarity
-        65%      -        -        -        -        Jaccard Index
-        80%      -        -        -        -        Levenshtein Distance
99.00%   99.23%   96.5%    97.14%   96.7%    94.07%   SVM
99.00%   99.98%   96.3%    95.9%    95.5%    93.3%    Neural Network
98.00%   99.98%   96.7%    98.0%    97.8%    96.3%    kNN
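
The comparison for the three classifiers in Table 6 can be reproduced with a few lines of Python. The values are copied from the SVM, neural network, and kNN rows; the helper name `best_gram` is hypothetical.

```python
# Accuracy (%) for 1-gram through 6-gram, copied from the last three rows of Table 6.
accuracy = {
    "SVM": [99.00, 99.23, 96.5, 97.14, 96.7, 94.07],
    "Neural Network": [99.00, 99.98, 96.3, 95.9, 95.5, 93.3],
    "kNN": [98.00, 99.98, 96.7, 98.0, 97.8, 96.3],
}


def best_gram(scores):
    """Return (gram size, accuracy) for the highest accuracy in a row."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


for name, scores in accuracy.items():
    n, value = best_gram(scores)
    print(f"{name}: best at {n}-gram with {value}%")

# Difference from the 98.7% reported for the compared research, in percentage points.
difference = round(99.98 - 98.7, 2)
print(difference)
```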

Figure 4. Chart accuracy detection for n-gram technique


4. CONCLUSION 

This study detects DDoS attacks using the n-gram technique. All data packets that flow into the
network are captured using tools created beforehand. The application uses the n-gram technique to
separate normal and DDoS packets; all data packets are converted to hexadecimal, and the result of
this conversion is called the payload. All payload strings were analyzed using several n-grams,
ranging from 1-gram to 6-gram. The 2-gram and 3-gram techniques have the lowest false positive
rate, 13%, with the highest true positive ratio reaching 99.98%, compared to previous studies.